In humans, learning depends on the joint contribution of multiple interacting systems: working memory (WM), long-term memory (LTM), and reinforcement learning (RL). The present study aims to understand the relative contributions of these systems during learning, as well as the specific strategies individuals might rely on. Collins (2018) put forward a combined working memory-reinforcement learning model that addresses this question, but it largely ignores long-term memory. We built four idiographic ACT-R learning models (single-mechanism RL and LTM models, and two integrated RL-LTM models: a meta-learning RL model and a parameter-bias RL model) using the Collins (2018) stimulus-response association task. Different models provided the best fits for individual learners (LTM: 63%, RL: 1%, meta-RL: 12%, bias-RL: 21% of participants), which suggests that irreducible differences in learning and meta-learning strategies exist across individuals. Each model predicted learning accuracy, learning rate, and testing accuracy for the subjects in its respective group.
This report describes the four ACT-R models and the learning outcomes produced by changes in their parameters. It also describes how these models fit the behavioral data and details the properties of the best-fitting models and parameters. The specific objective of this project is to test whether the RLWM task can be modeled well by a group of pure and combined declarative and RL learning models. After fitting the models to participant data, we aim to extract parameters that may explain why and how learning unfolded as observed. If the parameters capture individual differences in learning, do they also predict other behavioral measures, such as working memory capacity and reinforcement learning accuracy?
Below are the four ACT-R models tested. Note that the bolded names appear throughout this document.
RL: A pure RL model based on learning of production utility in ACT-R. The learning rate (alpha) and the softmax temperature are its only two parameters.
LTM: A declarative model that depends solely on the storage and retrieval of stimuli, responses, and outcomes in ACT-R’s declarative memory. This model depends on the memory decay rate, retrieval noise, and spreading activation parameters.
meta_RL: A combined RL-LTM model. Information about trials performed by the RL system is shared with and stored in LTM (declarative memory) for later use. An isolated (meta) RL system (a set of productions) learns and determines which sub-system, RL or LTM, is used throughout learning. Which subsystem is preferred depends on the specific set of parameters.
biased: A combined RL-LTM model. Information about trials performed by the RL system is not shared with the LTM portion of the model. An additional “strategy” parameter specifies a bias towards the RL model at 20, 40, 60, and 80 percent of the learning and test trials.
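As a rough illustration of the mechanisms named above (not the actual ACT-R model code), the core equations behind the RL and LTM components can be sketched as follows. The ACT-R utility-learning and base-level-learning equations are standard; the parameter values used here are placeholders, not fitted values.

```python
import math
import random

def update_utility(u, reward, alpha=0.2):
    """ACT-R utility learning: U <- U + alpha * (R - U)."""
    return u + alpha * (reward - u)

def softmax_choice(utilities, temperature=0.5, rng=random.random):
    """Boltzmann (softmax) selection over production utilities."""
    weights = [math.exp(u / temperature) for u in utilities]
    r = rng() * sum(weights)
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if r < cumulative:
            return i
    return len(weights) - 1

def base_level_activation(lags, decay=0.5):
    """ACT-R base-level learning: B = ln(sum over presentations t^-d),
    where each lag t is the time since a presentation of the chunk and
    d is the decay-rate parameter (bll)."""
    return math.log(sum(t ** (-decay) for t in lags))
```

In the LTM model, whether a stimulus-response chunk is retrieved depends on this activation plus retrieval noise; in the RL model, responding is driven by the utility update and softmax selection alone.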
The models are fit to behavioral data, and the best-fitting model and set of parameters is selected by comparing BIC values; the lowest BIC determines the winning model. To assess the quality of the fitted model and parameters, the RLWM task learning features were compared to the model outcomes. The features of interest are:

- Accuracy at the end of learning (accuracy after 12 stimulus presentations)
- Accuracy at test
- Change in accuracy from end of learning to test
- Learning rate
- Differences in the learning trajectories of the two set sizes

The expectations and outcomes are described below.
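The selection rule can be sketched as below; the log-likelihoods, parameter counts, and trial count are invented placeholders, not values from the actual fits.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical fits for one participant: (log-likelihood, parameter count)
fits = {
    "RL":      (-310.0, 2),
    "LTM":     (-280.0, 3),
    "meta_RL": (-285.0, 5),
    "biased":  (-290.0, 6),
}
n_trials = 396  # placeholder number of observations

scores = {m: bic(ll, k, n_trials) for m, (ll, k) in fits.items()}
winner = min(scores, key=scores.get)  # lowest BIC wins
```

Note that BIC penalizes each extra free parameter by ln(n), so the combined models must fit substantially better than the single-mechanism models to win.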
Of the four models compared, the LTM model best fit the largest number of participants (57), followed by the biased version of the combined RL-LTM model (11) and the meta-RL combined model (11). Only 4 participants were best fit by the RL-only model (Figure 1). This is a slight departure from our expectation that the combined RL-LTM models would fit the majority of participants. It suggests that most learners simply commit the stimulus-response associations to memory.
Figure 1. Counts of fit subjects by model
Within each group of participants (groups formed by preferred model type), there is only 1 best-fitting combination of values for the RL model’s alpha and softmax parameters. For the most popular model, LTM, which fit 57 participants, there were, surprisingly, only 14 best-fitting parameter-value sets for the spreading activation, retrieval noise, and memory decay rate parameters. The biased model was the most diverse, with 11 parameter sets for 11 participants. The meta-RL model closely followed the biased model in terms of diversity of parameter-value sets, with 10 sets for 11 subjects. Figures 2 and 3 show the medians and ranges of the BIC values that determined that the LTM model is the best-fitting model, even when comparing only the BIC values for the parameter-value sets that fit participants best in each category of models.
Figure 2.
Figure 2 shows that the LTM model has the lowest BIC values.
Figure 3.
How consistent are the fits observed above? Given a participant’s best fit, how many of the next-best-fitting parameter sets are in the same model category?
| model | mean | median | sd | min | max |
|---|---|---|---|---|---|
| Biased | 164.45455 | 146 | 153.95932 | 3 | 395 |
| LTM | 14.40351 | 9 | 12.13593 | 3 | 53 |
| Meta-RL | 131.54545 | 119 | 151.15116 | 2 | 443 |
| RL | 2.00000 | 2 | 0.00000 | 2 | 2 |
Figure 4
Figure 5
Figure 6
| subjects | X1 | X2 | X3 | model |
|---|---|---|---|---|
| 6217 | 5.3616978 | 2.5667977 | 0.9835599 | RL |
| 15001 | 1.9317027 | 1.0882666 | 1.0623723 | Meta-RL |
| 15005 | 2.3491642 | 0.9896057 | 0.9199334 | Meta-RL |
| 15014 | 0.2556102 | 3.2446232 | 0.2976195 | RL |
| 15016 | 1.9222742 | 0.9258556 | 0.3815785 | Meta-RL |
| 28306 | 25.2416031 | 2.0253609 | 3.0521664 | RL |
| 28328 | 14.4697705 | 4.9703081 | 0.2965980 | RL |
Looking at the learning curves for the four models in Figure 4, the differences in learning rates are apparent, as are other features such as the separation between the two set sizes. In the plot below, each data point is the average accuracy, for that number of stimulus presentations, across all parameter combinations. The LTM and RL models predict that an increase in set size does not diminish learning rate or accuracy. But this analysis washes out the individual differences that could be captured by the diverse set of parameter combinations.
Figure 7.
The panels in Figure 8 show the mean accuracy for participant behavioral data. The model lines are averages across parameters for that group only. Since we are aiming for an individual-differences view of these data, collapsing across so much of this variability is uninformative, as was shown above in Figure 4, especially if the differences, once fit to actual behavioral data, indicate large differences in learning outcomes or cognitive faculty diagnostics such as working memory capacity. Here, only the best-fitting sets of parameter combinations were selected and collapsed. As can be seen in the figure below, the model types appear vastly different, and some characteristics of the behavioral data have come through, such as the separation of the learning trajectories for the different set sizes in the RL-LTM biased model fit. Some parameter sets in the LTM model also capture the difficulty associated with increasing set size (solid lines in Fig. 8B). The LTM participants, on average, have the highest accuracies in the testing phase for both set sizes, but they are nearly indistinguishable from the meta-RL group in accuracy at the end of learning. The biased group shows the most separation between set sizes 3 and 6 during learning, and also lower accuracy at test than LTM. The biased group is negligibly different from the meta-RL group for set size 3 but shows a marked difference at set size 6, closely following the behavioral data.
Figure 8.
Figure 9
There are five outcome measures of interest in the RLWM task: accuracy at the end of learning, accuracy at test, learning rate (characterized as the slope estimate over the first 6 trials), the difference in learning between set sizes 3 and 6, and the level of preserved learning at test for both set sizes (test minus learning). The following analyses compare the model data with the behavioral data on these outcome measures.
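The learning-rate measure above can be sketched as an ordinary least-squares slope over the first 6 stimulus presentations; the accuracy values in this example are invented for illustration, not taken from the data.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# First 6 stimulus presentations and made-up mean accuracies
presentations = [1, 2, 3, 4, 5, 6]
accuracy = [0.35, 0.48, 0.57, 0.66, 0.72, 0.80]
learning_rate = ols_slope(presentations, accuracy)
```

A steeper positive slope indicates faster learning; the same estimate is computed per participant (and per set size) for both the behavioral and the model-generated data.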
Figure 10 below shows accuracy at end of learning and test. The models closely track the behavioral data. Note that the RL group has only two data points.
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| setSize | 1 | 0.0000641 | 0.0000641 | 0.0048209 | 0.9446873 |
| iteration | 1 | 1.2908057 | 1.2908057 | 97.1443094 | 0.0000000 |
| setSize:iteration | 1 | 0.1081829 | 0.1081829 | 8.1416983 | 0.0046007 |
| Residuals | 328 | 4.3583024 | 0.0132875 | NA | NA |
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| setSize | 1 | 0.0000343 | 0.0000343 | 0.0031057 | 0.9555921 |
| iteration | 1 | 1.8562669 | 1.8562669 | 168.2184894 | 0.0000000 |
| setSize:iteration | 1 | 0.0248099 | 0.0248099 | 2.2483246 | 0.1347210 |
| Residuals | 328 | 3.6194330 | 0.0110349 | NA | NA |
The models predict the learning rate for set size 3 in most groups (not in the explicit biased model; the RL group has too few data points to say). But the models predicted the learning rate for set size 6 only in the biased model. See Figure 11 below.
| setSize | mean(estimate) | median(estimate) |
|---|---|---|
| s3 | 0.1148164 | 0.1154762 |
| s6 | 0.0800440 | 0.0825397 |
#>
#> Welch Two Sample t-test
#>
#> data: estimate by setSize
#> t = 10.149, df = 142.26, p-value < 2.2e-16
#> alternative hypothesis: true difference in means between group s3 and group s6 is not equal to 0
#> 95 percent confidence interval:
#> 0.02799973 0.04154511
#> sample estimates:
#> mean in group s3 mean in group s6
#> 0.11481641 0.08004399
| setSize | type | model | mean | se |
|---|---|---|---|---|
| s3 | behav | Biased | 0.1024892 | 0.0055733 |
| s3 | behav | LTM | 0.1161654 | 0.0021151 |
| s3 | behav | Meta-RL | 0.1181818 | 0.0062193 |
| s3 | behav | RL | 0.1202381 | 0.0054771 |
| s3 | model | Biased | 0.0746926 | 0.0083405 |
| s3 | model | LTM | 0.1068822 | 0.0008860 |
| s3 | model | Meta-RL | 0.1048312 | 0.0032958 |
| s3 | model | RL | 0.1247619 | 0.0000000 |
| s6 | behav | Biased | 0.0525253 | 0.0100787 |
| s6 | behav | LTM | 0.0840017 | 0.0029315 |
| s6 | behav | Meta-RL | 0.0763348 | 0.0054408 |
| s6 | behav | RL | 0.1095238 | 0.0084179 |
| s6 | model | Biased | 0.0669048 | 0.0078334 |
| s6 | model | LTM | 0.1042473 | 0.0007384 |
| s6 | model | Meta-RL | 0.0920563 | 0.0050243 |
| s6 | model | RL | 0.1327619 | 0.0000000 |
#> Analysis of Variance Table
#>
#> Response: estimate
#> Df Sum Sq Mean Sq F value Pr(>F)
#> type 1 0.000802 0.0008018 1.7664 0.1848
#> model 2 0.030323 0.0151615 33.4032 7.29e-14 ***
#> type:model 2 0.001421 0.0007106 1.5656 0.2106
#> Residuals 310 0.140707 0.0004539
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Analysis of Variance Table
#>
#> Response: diff.mean
#> Df Sum Sq Mean Sq F value Pr(>F)
#> model 2 0.31826 0.159129 54.5990 < 2.2e-16 ***
#> type 1 0.05096 0.050963 17.4861 4.871e-05 ***
#> model:type 2 0.02021 0.010103 3.4663 0.03372 *
#> Residuals 152 0.44300 0.002914
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Wilcoxon signed rank test with continuity correction
#>
#> data: diff.mean
#> V = 52, p-value = 0.09983
#> alternative hypothesis: true location is not equal to 0
#>
#> Wilcoxon signed rank exact test
#>
#> data: diff.mean
#> V = 0, p-value = 0.0009766
#> alternative hypothesis: true location is not equal to 0
#>
#> Wilcoxon signed rank test with continuity correction
#>
#> data: diff.mean
#> V = 56, p-value = 8.088e-10
#> alternative hypothesis: true location is not equal to 0
#>
#> Kruskal-Wallis rank sum test
#>
#> data: diff.mean and type
#> Kruskal-Wallis chi-squared = 9.7565, df = 1, p-value = 0.001787
#> # A tibble: 3 × 3
#> group1 group2 p.value
#> <chr> <chr> <dbl>
#> 1 LTM Biased 0.00000826
#> 2 Meta-RL Biased 0.00230
#> 3 Meta-RL LTM 0.341
| statistic | p.value | parameter | method |
|---|---|---|---|
| 123.9622 | 0 | 2 | Kruskal-Wallis rank sum test |
| statistic | p.value | parameter | method |
|---|---|---|---|
| 229.5472 | 0 | 3 | Kruskal-Wallis rank sum test |
| group1 | group2 | p.value |
|---|---|---|
| LTM | Biased | 0.0000000 |
| Meta-RL | Biased | 0.0000000 |
| Meta-RL | LTM | 0.1393724 |
| RL | Biased | 0.0000000 |
| RL | LTM | 0.6244517 |
| RL | Meta-RL | 1.0000000 |
| group1 | group2 | p.value |
|---|---|---|
| LTM | Biased | 0.0000000 |
| Meta-RL | Biased | 0.0000000 |
| Meta-RL | LTM | 0.0000000 |
| RL | Biased | 0.0000000 |
| RL | LTM | 0.0025631 |
| RL | Meta-RL | 0.0537204 |
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| setSize | 1 | 0.2487593 | 0.2487593 | 25.016136 | 0.0000010 |
| type | 1 | 0.0559410 | 0.0559410 | 5.625629 | 0.0183223 |
| model | 2 | 0.2206534 | 0.1103267 | 11.094856 | 0.0000224 |
| setSize:type | 1 | 0.0305314 | 0.0305314 | 3.070350 | 0.0807405 |
| setSize:model | 2 | 0.3386820 | 0.1693410 | 17.029545 | 0.0000001 |
| type:model | 2 | 0.0211612 | 0.0105806 | 1.064023 | 0.3463464 |
| setSize:type:model | 2 | 0.0289131 | 0.0144566 | 1.453806 | 0.2352992 |
| Residuals | 304 | 3.0229615 | 0.0099440 | NA | NA |
It is difficult to assess what the model fits are capturing without examining the specific parameter sets more carefully, or without determining whether membership in a particular model group predicts other cognitive or learning characteristics of the subjects. A summary of the parameter data follows.
First, for the cohort of subjects
Figure 14.
| variable | mean | median |
|---|---|---|
| alpha | 0.1500000 | 0.1500000 |
| egs | 0.3000000 | 0.3000000 |
| bll | 0.5500000 | 0.5500000 |
| imag | 0.2750000 | 0.2500000 |
| ans | 0.3000000 | 0.3000000 |
| bias | 0.3713112 | 0.3214167 |
Some specific plans are to estimate the three LTM parameters for all 83 participants and see whether they are related to the WM and PSS measures. Also, how are the parameters related to the “separation” between s3 and s6?
Some more specific things to test might be the effect of delay between stimulus presentations.

### What are the differences in learning type in terms of behavioral outcomes in other tasks?
These plots show group effects for uCLIMB subjects only, on the Python and OLCTS measures and behavioral predictors.
We have 3Back and PSS scores for a large majority of participants. What are the group differences, if any, in these outcomes based on model fit?
Chantel’s request: combine language and programming measures and compare groups.